Bring OCR to Mealie for importing scanned recipes #1244

Closed
wants to merge 61 commits into from

Conversation

Miroito
Contributor

@Miroito Miroito commented May 19, 2022

Added so far

  • New tab in the recipe creation page for scanned recipes (I'm open to suggestions for the icon)
  • As a first draft, the recipe is created with the recognized text in the description field to copy and paste later in edit mode. (No longer the case.)
    2022-07-28 Update: I'm at a point where I think the component is very usable, and I'm open to reviews so we can merge something very close to what is already implemented.

Before merging

Here is the list of tasks to complete before we consider it acceptable to merge this code into the beta

  • Add the experimental feature flag
  • Write tests for each service
    • image_to_string (skipped because tesseract's output is not reliable between distros)
    • image_to_tsv (skipped because tesseract's output is not reliable between distros)
    • format_tsv_output
  • Restrict the possible file types
  • Checkbox to make the uploaded picture the recipe thumbnail
  • Nice UI to make the process of importing a recipe like this easier and usable
  • Add possibility to automatically populate single ingredients/steps fields
  • Replace the POST tsv route to take asset names in order instead of files that are already on the server
  • Add a way to return to the ocr-editor page after a recipe was created
  • Remove sample ingredients and steps from the initial recipe creation
  • Tidy up the ui (looking especially at the buttons)
  • Preserve lines or paragraphs
  • Fix the things that make the CI so angry
  • Write help dialog
  • Bugfix: Mouse position is offset when the page is scrolled down
  • Make all the hard coded English text translatable
  • Add mode to highlight boxes of recognized text on the image (recognized text is highlighted on component mount)
  • Fix bug where ratio is not respected on big images
  • Add a quick(er) help for users to get going as fast as possible.

Nice-to-haves

  • Define the canvas variable pointing to the canvas HTML element only once instead of in every function of the ocr-editor component
  • Add advanced settings (e.g. Language to improve pytesseract recognition)
  • Possibility to add multiple pages/files and switch between them
  • Clean up the ocr-editor page by creating a RecipeOcrEditor component
  • Automatic field filling suggestion
    • Recipe title

Design

This new feature is based on previous experience with a similar software solution called Esker.
The process I have designed so far gives the user a new creation page, /recipe/create/ocr, where they can upload a picture and optionally make it the recipe thumbnail. This creates a recipe called "New OCR Recipe" with the uploaded picture as an asset called "Original recipe image". Additionally, a new column in the recipes table records that this recipe is an OCR recipe.

The user is directed to the page "recipe/_slug/ocr-editor", where they can use the image they uploaded to fill the usual recipe fields on the right part of the page. When this page is mounted, it sends the asset name to the backend, which sends back the text contained in the image along with its position.
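To make the "text and its position" data concrete, here is a minimal sketch of turning Tesseract's TSV output into word boxes. It assumes the standard Tesseract TSV columns (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text). The names WordBox and parse_tsv are hypothetical and are not the PR's format_tsv_output.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class WordBox:
    block_num: int
    par_num: int
    line_num: int
    left: int
    top: int
    width: int
    height: int
    text: str


def parse_tsv(tsv: str) -> list[WordBox]:
    """Keep only word-level rows (level 5) that carry non-empty text."""
    reader = csv.DictReader(io.StringIO(tsv), delimiter="\t", quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        # Levels 1-4 describe pages, blocks, paragraphs and lines, not words.
        if row["level"] != "5" or not text:
            continue
        words.append(
            WordBox(
                block_num=int(row["block_num"]),
                par_num=int(row["par_num"]),
                line_num=int(row["line_num"]),
                left=int(row["left"]),
                top=int(row["top"]),
                width=int(row["width"]),
                height=int(row["height"]),
                text=text,
            )
        )
    return words
```

Each WordBox carries enough information both to draw a highlight box on the canvas and to group words by block or line later.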

Two modes are available.

  • Selection mode lets the user input data.
  • Pan and Zoom mode lets the user move around when pictures are big enough to do so.

In selection mode, the user can draw a rectangle, and the identified text will appear under the canvas. The user can then select any recipe field on the right and click anywhere inside the rectangle. This takes whatever text is fully contained in the rectangle and overwrites the field that was last selected.
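The "fully contained" rule can be sketched as follows. This is an illustration in Python rather than the component's actual code (which lives in the Vue page); Rect, normalize, contains and select_text are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.left + self.width

    @property
    def bottom(self) -> float:
        return self.top + self.height


def normalize(x0: float, y0: float, x1: float, y1: float) -> Rect:
    """Build a rectangle from two drag corners, whatever the drag direction."""
    return Rect(min(x0, x1), min(y0, y1), abs(x1 - x0), abs(y1 - y0))


def contains(outer: Rect, inner: Rect) -> bool:
    """True only if every edge of inner lies inside outer."""
    return (
        outer.left <= inner.left
        and outer.top <= inner.top
        and inner.right <= outer.right
        and inner.bottom <= outer.bottom
    )


def select_text(selection: Rect, words: list[tuple[Rect, str]]) -> str:
    """Join the words whose boxes are fully inside the selection."""
    return " ".join(text for box, text in words if contains(selection, box))
```

A word that is only partly covered by the rectangle is left out, which matches the behavior described above.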

The bulk add buttons open a dialog pre-filled with the selected text (that is, the text under the drawn rectangle).
This is where the Split text modes come into play: they let the user choose whether to keep all line breaks. For example, if a recipe book lists one ingredient per line, the user can select the whole list, press bulk add on the ingredient tab, and add all ingredients in 2 clicks.

The flatten mode removes all line breaks, and the blocks mode puts line breaks between the blocks identified by tesseract. The blocks mode is pretty useful for instructions, which usually come as multiple paragraphs in the form of blocks, making it easier to use the bulk add dialog, this time for instructions.
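A rough sketch of the three split modes, assuming each recognized word carries the block and line numbers reported by tesseract and that words arrive in reading order. The mode name "lines" and the function split_text are hypothetical; only "flatten" and "blocks" are named above.

```python
from itertools import groupby

# Each word is (block_num, line_num, text), listed in reading order.
Word = tuple[int, int, str]

# How words are grouped before a line break is inserted between groups.
GROUP_KEYS = {
    "lines": lambda w: (w[0], w[1]),  # keep every recognized line break
    "blocks": lambda w: w[0],         # break only between blocks
}


def split_text(words: list[Word], mode: str) -> str:
    if mode == "flatten":
        # No line breaks at all.
        return " ".join(text for _, _, text in words)
    if mode not in GROUP_KEYS:
        raise ValueError(f"unknown split mode: {mode}")
    groups = groupby(words, key=GROUP_KEYS[mode])
    return "\n".join(" ".join(text for _, _, text in group) for _, group in groups)
```

With an ingredient list, "lines" yields one ingredient per line for the bulk add dialog, while "blocks" keeps each paragraph of the instructions together.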

For recipes that are called New OCR Recipe or New OCR Recipe (n), i.e. that match the regex /New\sOCR\sRecipe(\s\([0-9]+\))?/g, the ocr-editor component takes the biggest block with the fewest words, assumes it is the recipe's title, and populates the recipe name field with it. This is done by the function findRecipeTitle in the ocr-editor component.
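The sketch below illustrates the idea, not the actual findRecipeTitle. It makes two assumptions: the name check matches the whole string (the JavaScript regex above has no anchors), and "biggest block with the fewest words" is read as "tallest text first, fewest words as a tie-breaker". Block, is_default_name and guess_title are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Same pattern as the regex in the description, without the JavaScript /g flag.
DEFAULT_NAME = re.compile(r"New\sOCR\sRecipe(\s\([0-9]+\))?")


def is_default_name(name: str) -> bool:
    """True for 'New OCR Recipe' and 'New OCR Recipe (n)' only."""
    return DEFAULT_NAME.fullmatch(name) is not None


@dataclass
class Block:
    line_height: int  # pixel height of the text in this block (assumed available)
    words: list[str]


def guess_title(blocks: list[Block]) -> Optional[str]:
    """Pick the block with the tallest text, preferring fewer words on ties."""
    candidates = [b for b in blocks if b.words]
    if not candidates:
        return None
    best = min(candidates, key=lambda b: (-b.line_height, len(b.words)))
    return " ".join(best.words)
```

The title would only be suggested while the recipe still has its default name, so a name the user has already edited is never overwritten.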

When the user is happy with the edits, the recipe can be saved the usual way. They can come back to the OCR editor page by clicking the usual edit button and using the new "OCR Editor" button, which appears when the recipe is an OCR recipe (hence the new table column).

@hay-kot
Collaborator

hay-kot commented May 22, 2022

FYI Rebasing should get CI sorted.

Need the changes from #1252

.pre-commit-config.yaml (outdated, resolved)
@Miroito Miroito marked this pull request as ready for review July 28, 2022 17:49
@hay-kot
Collaborator

hay-kot commented Jul 28, 2022

Problem with your CI was your poetry lock file

CleanShot 2022-07-28 at 14 29 47@2x

Now it's failing for other reasons 😄

@Miroito
Contributor Author

Miroito commented Jul 29, 2022

Thanks, I missed this line in the feedback.

Yes, it sounds like the type check is pretty angry at me for being sloppy. I'll get it to work.

@Miroito Miroito changed the title Bring OCR to Mealie for importing scanned recipes Draft: Bring OCR to Mealie for importing scanned recipes Aug 5, 2022
@Miroito Miroito changed the title Draft: Bring OCR to Mealie for importing scanned recipes Draft:Bring OCR to Mealie for importing scanned recipes Aug 5, 2022
@Miroito Miroito changed the title Draft:Bring OCR to Mealie for importing scanned recipes Bring OCR to Mealie for importing scanned recipes Aug 5, 2022
@Miroito Miroito marked this pull request as draft August 5, 2022 17:46
@Miroito Miroito marked this pull request as ready for review August 8, 2022 17:04
Collaborator

@hay-kot hay-kot left a comment

These are just my cursory review comments. I haven't done a thorough review and there will likely be more changes required to get this merged in.

Before I dig too deep into this, I would like to see a more thorough write-up of the feature overall and how it works.

Some critical areas that need more documentation are frontend/pages/recipe/_slug/ocr-editor.vue and the tests you've written: what is being tested, and how we can validate the skipped tests if the output is somewhat unreliable.

Looks really cool so far, thanks for your work on this one!

.pre-commit-config.yaml (outdated, resolved)
frontend/api/class-interfaces/ocr.ts (outdated, resolved)
mealie/schema/recipe/recipe.py (resolved)
@Miroito
Contributor Author

Miroito commented Aug 9, 2022

Added a design section to explain all the relevant work that makes this possible. Hopefully the block of text is not too hard to read.
I want to add a limitations section later on to explain my thoughts on the current implementation and what can be done to make it more stable, more reliable, and functionally richer.

@Miroito Miroito force-pushed the ocr branch 2 times, most recently from 8e82e38 to a1bdaf4 Compare August 16, 2022 16:07
@Miroito
Contributor Author

Miroito commented Aug 16, 2022

Rebased to fix merge conflicts with mealie-next branch

@Miroito
Contributor Author

Miroito commented Aug 19, 2022

I have tried using the feature to add a bunch of recipes to see how it does. And... it outputs gibberish when jpg files include a lot of text, which is the worst user experience imaginable. Either there is a lot of pre-processing to do, which would make it much slower than it is currently,
or, at least temporarily, I could either restrict the files to png or convert all images to png when they are uploaded.

I'm a little bit disappointed with tesseract. I'm going to investigate further as to why it is behaving this way.

Putting the PR temporarily in draft again.

@Miroito Miroito marked this pull request as draft August 19, 2022 17:47
@Miroito Miroito marked this pull request as ready for review September 3, 2022 16:17
@Miroito
Contributor Author

Miroito commented Sep 3, 2022

I can't keep rebasing this amount of changes every time, so I have marked the PR as ready for review. Feedback would be greatly appreciated so we can merge this as soon as possible.

@hay-kot
Collaborator

hay-kot commented Sep 3, 2022

> I have tried using the feature to add a bunch of recipes to see how it does. And... it outputs gibberish when jpg files include a lot of text, which is the worst user experience imaginable. Either there is a lot of pre-processing to do, which would make it much slower than it is currently,
> or, at least temporarily, I could either restrict the files to png or convert all images to png when they are uploaded.
>
> I'm a little bit disappointed with tesseract. I'm going to investigate further as to why it is behaving this way.
>
> Putting the PR temporarily in draft again.

Any updates on this comment?

@Miroito
Contributor Author

Miroito commented Sep 3, 2022

> > I have tried using the feature to add a bunch of recipes to see how it does. And... it outputs gibberish when jpg files include a lot of text, which is the worst user experience imaginable. Either there is a lot of pre-processing to do, which would make it much slower than it is currently,
> > or, at least temporarily, I could either restrict the files to png or convert all images to png when they are uploaded.
> > I'm a little bit disappointed with tesseract. I'm going to investigate further as to why it is behaving this way.
> > Putting the PR temporarily in draft again.
>
> Any updates on this comment?

For now, I have restricted the image format to png. I found no help on Tesseract's side, though I did not look very far.

There is also the option of converting any input image to png and then running the OCR on the png. This adds a huge overhead, which would mean I could no longer afford to ask the server to recognize the characters in the image every time the OCR editor is loaded, as it does now.
I think that in the long term it is worth taking a little more time when creating the recipe and storing all the OCR info in the database. Creating the recipe the first time will take longer, but the recognition can probably be improved by preprocessing, and the long processing would happen only once.
You might ask why I did not implement it this way first. It seemed easier to do it the current way and bring the feature "fast" to the main branch. The rest of the work can be done later, as the current implementation does not block further improvements of this nature and is already a good base.
I have already used it a few times these past weeks by taking pictures of my books, creating the recipe on my dev instance, and downloading the zip file to upload it to my usual Mealie instance. I think it is already more than usable, and improvements can come later with user feedback as well.
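
The "process once, store the result" idea could look roughly like the sketch below. It uses the standard library's sqlite3 and a hypothetical ocr_cache table keyed by a hash of the image bytes; it is not Mealie's actual database layer, and run_ocr stands in for whatever slow preprocessing and recognition step ends up being used.

```python
import hashlib
import sqlite3
from typing import Callable


def cached_ocr(
    db: sqlite3.Connection,
    image_bytes: bytes,
    run_ocr: Callable[[bytes], str],
) -> str:
    """Return stored OCR output for this image, running the OCR only on a miss."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS ocr_cache (digest TEXT PRIMARY KEY, tsv TEXT NOT NULL)"
    )
    digest = hashlib.sha256(image_bytes).hexdigest()
    row = db.execute("SELECT tsv FROM ocr_cache WHERE digest = ?", (digest,)).fetchone()
    if row is not None:
        return row[0]  # already processed: no OCR call needed
    tsv = run_ocr(image_bytes)  # slow path, happens once per image
    db.execute("INSERT INTO ocr_cache (digest, tsv) VALUES (?, ?)", (digest, tsv))
    db.commit()
    return tsv
```

With something like this, loading the OCR editor a second time would read the stored result instead of asking tesseract again.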

@Miroito Miroito requested a review from hay-kot September 16, 2022 17:18
@hay-kot hay-kot mentioned this pull request Sep 22, 2022
Collaborator

@hay-kot hay-kot left a comment

I tried to go through and clean up what was left to get it merged, but there was too much going on in the Vue component for me to dig through and fix everything. Maybe when I have some more time I can go through it and clean it up, but I left some comments on the issues that I'm seeing.

The biggest problem with the component is that there is just so much going on, and it's difficult to group what belongs together. What I was going to do was break the functions and state that go together into separate composables and place those in a different file to allow for logical grouping of items. Maybe related: there were so many TypeScript errors in VSCode from Volar that it made reviewing extremely difficult. Not sure what the issue was there.

btw, you've got a __init.py__ file in the mealie/services/ocr that needs to be fixed.

Like I said, I tried to get it to a good point, but it was more of a weekend project. Will hopefully take another crack at it this weekend if you don't get to it first.

frontend/pages/recipe/_slug/ocr-editor.vue (outdated, resolved)
frontend/pages/recipe/_slug/ocr-editor.vue (outdated, resolved)
frontend/pages/recipe/_slug/ocr-editor.vue (outdated, resolved)
@Miroito
Contributor Author

Miroito commented Sep 22, 2022

> I tried to go through and clean up what was left to get it merged, but there was too much going on in the Vue component for me to dig through and fix everything. Maybe when I have some more time I can go through it and clean it up, but I left some comments on the issues that I'm seeing.
> The biggest problem with the component is that there is just so much going on, and it's difficult to group what belongs together. What I was going to do was break the functions and state that go together into separate composables and place those in a different file to allow for logical grouping of items. Maybe related: there were so many TypeScript errors in VSCode from Volar that it made reviewing extremely difficult. Not sure what the issue was there.

I can do the cleanup if you think it should be done before it is merged. One of the reasons I have not done it yet is that most of the script mess is event handlers for the canvas, which means that if I move the canvas to its own component, for example, most of the current functions will follow, hence just moving the mess. Most of it is math or helper functions to prevent code duplication. I'll try to make it look as good as I can, and you can let me know what you think.

The least I can do is give it a try; it will be easier for me to clean up since I wrote all those things.

> btw, you've got a __init.py__ file in the mealie/services/ocr that needs to be fixed.

Lol yes, nice catch. I don't really know what to write inside, so I left it empty.

A more general note: Volar is complaining only in the template about the recipe returned by useRecipe because the Recipe type has all fields optional. Honestly this is a more general issue that even the new recipe page is not compliant with. The component prop asks for a NoUndefinedFields<Recipe> but pages/recipe/_slug/index.vue gives it a Recipe anyway. There is just one error reported instead of multiple since the RecipePage component assumes everything is there. This issue alone makes it very difficult to write a proper page using a recipe.
Maybe it would be nicer to have a set of fields that a recipe must have, and a set of truly optional properties that we can check on a case by case basis. I mean even the assets (or ingredients, or instructions) property could be an empty array rather than sending back undefined, the existence of undefined and null is pure pain...

@hay-kot
Collaborator

hay-kot commented Sep 25, 2022

Merged as a part of #1670 with a few minor things cleaned up. Thanks for sticking with this one until the end. 🎉
